
    Analysis of deep stress field using well log and wellbore breakout data: a case study in Cretaceous oil reservoir, southwest Iran

    To address wellbore instability in the Bangestan oil reservoir in southwestern Iran, the direction and magnitude of the in-situ stresses were determined using two different methods in this study. Results of an injection test and analysis of wellbore breakouts were used to verify the accuracy of the stress profiles. The Barton method, which uses the breakout angle and rock strength, was applied. In addition, an artificial neural network was used to estimate the elastic parameters of the rock and the stress field; the network output estimates the desired parameters with high accuracy. The Mohr-Coulomb failure criterion was also used to verify the stress profiles. The estimated stresses show reasonable agreement with the results of the injection test and the Barton method. The minimum mud pressure required to prevent shear failure was calculated using the Mohr-Coulomb failure criterion and the estimated stress profiles. The results agree well with the failures identified in the caliper and image logs, although discrepancies are observed at some depths; these are attributed to concentrations of fractures, collisions between the drill string and the wellbore wall, and swab and surge pressures. Based on the estimated stress profiles, the stress regime is normal at some depths and strike-slip at others. According to the direction of the breakouts, which is clearly visible in the caliper and image logs, the minimum and maximum horizontal stress directions are NW-SE and NE-SW, respectively. These directions are consistent with the direction of regional stresses in the Zagros belt.
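
    For reference, a minimal statement of the Mohr-Coulomb failure criterion invoked above, in shear-stress and principal-stress form (c is cohesion, \phi the internal friction angle, UCS the unconfined compressive strength; the paper's actual parameter values are not reproduced here):

        \tau = c + \sigma_n \tan\phi,
        \qquad
        \sigma_1 = \mathrm{UCS} + \sigma_3 \tan^2\!\left(45^\circ + \tfrac{\phi}{2}\right),
        \quad \text{with } \mathrm{UCS} = \frac{2c\cos\phi}{1 - \sin\phi}.

    In wellbore-stability analysis, the criterion is typically evaluated against the stress concentration at the wellbore wall to find the minimum mud pressure at which no shear failure occurs.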

    Cross-domain Voice Activity Detection with Self-Supervised Representations

    Voice Activity Detection (VAD) aims to detect speech segments in an audio signal, a necessary first step for many of today's speech-based applications. Current state-of-the-art methods focus on training a neural network on features computed directly from the acoustics, such as Mel Filter Banks (MFBs). Such methods therefore require an extra normalisation step to adapt to a new domain where the acoustics are affected, which can simply be due to a change of speaker, microphone, or environment. In addition, this normalisation step is usually rather rudimentary and has notable limitations, such as being highly sensitive to the amount of data available for the new domain. Here, we exploit the crowd-sourced Common Voice (CV) corpus to show that representations based on Self-Supervised Learning (SSL) adapt well to different domains, because they are computed from contextualised representations of speech across multiple domains. SSL representations also achieve better results than systems based on hand-crafted representations (MFBs) and off-the-shelf VADs, with significant improvements in cross-domain settings.
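
    As a rough illustration of the idea, the sketch below places a frame-level speech/non-speech classifier on top of frozen SSL features; it assumes torchaudio's WAV2VEC2_BASE bundle as a stand-in for the SSL encoder, and the classifier head is a placeholder, not the paper's actual VAD architecture:

        # Minimal VAD sketch over frozen SSL features (illustrative only).
        import torch
        import torchaudio

        bundle = torchaudio.pipelines.WAV2VEC2_BASE
        ssl_encoder = bundle.get_model().eval()          # frozen SSL feature extractor

        class FrameVAD(torch.nn.Module):
            """Per-frame speech / non-speech classifier over SSL features."""
            def __init__(self, feature_dim=768):
                super().__init__()
                self.head = torch.nn.Linear(feature_dim, 2)

            def forward(self, waveform):
                with torch.no_grad():                    # SSL encoder is not updated
                    features, _ = ssl_encoder.extract_features(waveform)
                return self.head(features[-1])           # per-frame logits, last layer

        vad = FrameVAD()
        waveform = torch.randn(1, bundle.sample_rate)    # one second of dummy audio
        frame_logits = vad(waveform)                     # shape: (1, n_frames, 2)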

    AVEC 2019 workshop and challenge: state-of-mind, detecting depression with AI, and cross-cultural affect recognition

    The Audio/Visual Emotion Challenge and Workshop (AVEC 2019) "State-of-Mind, Detecting Depression with AI, and Cross-cultural Affect Recognition" is the ninth competition event aimed at the comparison of multimedia processing and machine learning methods for automatic audiovisual health and emotion analysis, with all participants competing strictly under the same conditions. The goal of the Challenge is to provide a common benchmark test set for multimodal information processing and to bring together the health and emotion recognition communities, as well as the audiovisual processing communities, to compare the relative merits of various approaches to health and emotion recognition from real-life data. This paper presents the major novelties introduced this year, the challenge guidelines, the data used, and the performance of the baseline systems on the three proposed tasks: state-of-mind recognition, depression assessment with AI, and cross-cultural affect sensing.

    LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech

    Self-Supervised Learning (SSL) using huge amounts of unlabeled data has been successfully explored for image and natural language processing. Recent works have also investigated SSL from speech, and were notably successful in improving performance on downstream tasks such as automatic speech recognition (ASR). While these works suggest it is possible to reduce dependence on labeled data for building efficient speech systems, their evaluation has mostly been conducted on ASR and in multiple, heterogeneous experimental settings (most of them for English). This calls into question the objective comparison of SSL approaches and the evaluation of their impact on building speech systems. In this paper, we propose LeBenchmark: a reproducible framework for assessing SSL from speech. It includes not only ASR (high- and low-resource) tasks but also spoken language understanding, speech translation, and emotion recognition. We also focus on speech technologies in a language other than English: French. SSL models of different sizes are trained on carefully sourced and documented datasets. Experiments show that SSL is beneficial for most but not all tasks, which confirms the need for exhaustive and reliable benchmarks to evaluate its real impact. LeBenchmark is shared with the scientific community for reproducible research in SSL from speech.

    LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech

    Self-supervised learning (SSL) is at the origin of unprecedented improvements in many domains, including computer vision and natural language processing. Speech processing has benefited greatly from SSL, as most current domain-related tasks are now approached with pre-trained models. This work introduces LeBenchmark 2.0, an open-source framework for assessing and building SSL-equipped French speech technologies. It includes documented, large-scale corpora with up to 14,000 hours of heterogeneous speech; ten pre-trained SSL wav2vec 2.0 models, with 26 million to one billion learnable parameters, shared with the community; and an evaluation protocol made of six downstream tasks that complement existing benchmarks. LeBenchmark 2.0 also offers unique perspectives on pre-trained SSL models for speech, with an investigation of frozen versus fine-tuned downstream models and task-agnostic versus task-specific pre-trained models, as well as a discussion of the carbon footprint of large-scale model training.
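
    A minimal sketch of the frozen versus fine-tuned settings mentioned above, using torchaudio's wav2vec 2.0 bundle as a stand-in; LeBenchmark's actual models and training recipes are not reproduced here:

        # Frozen setting: the SSL encoder is used as a fixed feature extractor.
        import torchaudio

        ssl_model = torchaudio.pipelines.WAV2VEC2_BASE.get_model()
        for p in ssl_model.parameters():
            p.requires_grad = False   # only the downstream head is trained

        # Fine-tuned setting: the SSL weights are updated together with the
        # downstream task by leaving requires_grad set to True (the default).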

    On the evolution of speech representations for affective computing: A brief history and critical overview

    Recent advances in the field of machine learning have shown great potential for the automatic recognition of apparent human emotions. In the era of Internet of Things and big-data processing, where voice-based systems are well established, opportunities to leverage cutting-edge technologies to develop personalized and human-centered services are genuinely real, with growing demand in many areas such as education, health, well-being, and entertainment. Automatic emotion recognition from speech, which is a key element for developing personalized and human-centered services, has reached a degree of maturity that makes it of broad commercial interest today. However, there are still major limiting factors that prevent broad applicability of emotion recognition technology. For example, one open challenge is the poor generalization capability of currently used feature extraction techniques across different persons, contexts, cultures, and languages.

    EP004198978A1

    The invention relates to a computer-implemented method for real-time emotion recognition from a real-time audio signal. The method includes transcribing, into text, an audio speech signal contained in the audio signal by an automatic speech recognition model, and computing, by a speech representation model, a joint representation vector corresponding to a joint representation of the speech as a function of the speech signal and the text. The method also includes computing, by an emotion prediction model, an emotion embedding vector as a function of the joint representation vector, and mapping the emotion into at least one emotional frame, according to the emotion embedding vector, by an emotion mapping model. The invention further relates to a computer program and a device implementing such a method.
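
    The sketch below only restates the data flow described in the abstract as code; every module name here is a hypothetical placeholder, not the patent's actual implementation:

        # Schematic pipeline: audio -> transcript -> joint representation ->
        # emotion embedding -> emotional frame (all components are placeholders).
        import torch

        class EmotionRecognitionPipeline(torch.nn.Module):
            def __init__(self, asr, joint_encoder, emotion_predictor, emotion_mapper):
                super().__init__()
                self.asr = asr                              # speech signal -> text
                self.joint_encoder = joint_encoder          # (speech, text) -> joint vector
                self.emotion_predictor = emotion_predictor  # joint vector -> embedding
                self.emotion_mapper = emotion_mapper        # embedding -> emotional frame

            def forward(self, audio):
                text = self.asr(audio)
                joint_vector = self.joint_encoder(audio, text)
                emotion_embedding = self.emotion_predictor(joint_vector)
                return self.emotion_mapper(emotion_embedding)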

    Reconnaissance d'affects multi-corpus avec Emotion Embeddings et représentations auto-supervisées de la parole

    Speech emotion recognition systems use data-driven machine learning techniques that rely on annotated corpora. To achieve usable performance in real life, we need to exploit multiple different datasets, since each one can shed light on some specific expression of affect. However, different corpora use subjectively defined annotation schemes, which makes it challenging to train a model that can sense similar emotions across different corpora. Here, we propose a method that can relate similar emotions across corpora without being explicitly trained for it. Our method relies on self-supervised representations, which provide highly contextualised speech representations, and on multi-task learning paradigms. This allows training on different corpora without changing their labelling schemes. The results show that by fine-tuning self-supervised representations on each corpus separately, we can significantly improve the state-of-the-art within-corpus performance. We further demonstrate that by using multiple corpora during the training of the same model, we can improve the cross-corpus performance, and we show that our emotion embeddings can effectively recognise the same emotions across different corpora.
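
    A minimal sketch of the multi-task setup described above, with one shared encoder and one classification head per corpus so that each corpus keeps its own labelling scheme; the encoder stand-in, corpus names, and label counts are illustrative, not the paper's configuration:

        import torch

        class MultiCorpusEmotionModel(torch.nn.Module):
            def __init__(self, encoder, embedding_dim, corpus_label_counts):
                super().__init__()
                self.encoder = encoder                   # shared (e.g. SSL-based) encoder
                self.heads = torch.nn.ModuleDict({       # one classification head per corpus
                    corpus: torch.nn.Linear(embedding_dim, n_labels)
                    for corpus, n_labels in corpus_label_counts.items()
                })

            def forward(self, features, corpus):
                emotion_embedding = self.encoder(features)    # shared emotion embedding
                return self.heads[corpus](emotion_embedding)  # corpus-specific logits

        model = MultiCorpusEmotionModel(
            encoder=torch.nn.Linear(768, 128),                   # stand-in for an SSL encoder
            embedding_dim=128,
            corpus_label_counts={"corpus_a": 4, "corpus_b": 6},  # hypothetical corpora
        )
        logits = model(torch.randn(2, 768), corpus="corpus_a")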

    LeBenchmark, un référentiel d'évaluation pour le français oral

    Self-supervised learning has brought remarkable improvements in many domains, such as computer vision or language and speech processing, by exploiting large amounts of unlabelled data. In the specific context of speech, however, and despite promising results, there is a clear lack of standardisation in the evaluation procedures that would allow precise comparisons of these models, especially for languages other than English. We present here to the French-speaking community LeBenchmark, an open-source and reproducible reference framework for evaluating self-supervised models on French speech corpora. It is composed of four tasks: automatic speech recognition, spoken language understanding, automatic speech translation, and automatic emotion recognition. We encourage the French-speaking community to use this benchmark in its future experiments, in particular for the evaluation of self-supervised models.